Loan Data from Prosper

by Thao Ha (Katie)

This project is part of the Data Analyst Nanodegree program.

Introduction

In this exercise, I will explore a financial data set, “Prosper Loans”. Prosper is an American peer-to-peer lending company that offers personal loans at low rates. These loans are unsecured, which means you do not have to put up any collateral (like a house or a car) that could be taken away if you can’t make payments. Each loan is typically funded by multiple people across the United States. In this way, Prosper is a marketplace connecting those who need a loan with those who have extra money to lend. The data set contains 113,937 loan observations with 81 variables each, and I will explore the patterns among them.

Summary Statistics

First, I run some basic functions to examine the structure and schema of the data set.
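The listing that follows has the shape of output from R's `str()`, `summary()` and `names()`. As a minimal sketch, assuming those were the functions used, here they are applied to a toy data frame. The name `loans` is hypothetical, and the data frame holds only four columns and the first five rows shown in the listing:

```r
# Toy stand-in for the Prosper data: values copied from the first five rows
# of the structure listing. The real data frame has 81 columns.
loans <- data.frame(
  Term = c(36L, 36L, 36L, 36L, 36L),
  BorrowerRate = c(0.158, 0.092, 0.275, 0.0974, 0.2085),
  LoanOriginalAmount = c(9425L, 10000L, 3001L, 10000L, 15000L),
  Investors = c(258L, 1L, 41L, 158L, 20L)
)

str(loans)             # column types and the first few values
print(summary(loans))  # quartiles, mean and NA counts per column
print(names(loans))    # column names only
```

On the full data set the same three calls would take the loaded data frame as their only argument.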

## 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Factor w/ 113064 levels "2005-11-09 20:44:28.847000000",..: 14184 111894 6429 64760 85967 100310 72556 74019 97834 97834 ...
##  $ CreditGrade                        : Factor w/ 9 levels "","A","AA","B",..: 5 1 8 1 1 1 1 1 1 1 ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Factor w/ 2803 levels "","2005-11-25 00:00:00",..: 1138 1 1263 1 1 1 1 1 1 1 ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating..numeric.            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating..Alpha.              : Factor w/ 8 levels "","A","AA","B",..: 1 2 1 2 6 4 7 5 3 3 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory..numeric.          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
##  $ Occupation                         : Factor w/ 68 levels "","Accountant/CPA",..: 37 43 37 52 21 43 50 29 24 24 ...
##  $ EmploymentStatus                   : Factor w/ 9 levels "","Employed",..: 9 2 4 2 2 2 2 2 2 2 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 707 levels "","00343376901312423168731",..: 1 1 335 1 1 1 1 1 1 1 ...
##  $ DateCreditPulled                   : Factor w/ 112992 levels "2005-11-09 00:30:04.487000000",..: 14347 111883 6446 64724 85857 100382 72500 73937 97888 97888 ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Factor w/ 11586 levels "","1947-08-24 00:00:00",..: 8639 6617 8927 2247 9498 497 8265 7685 5543 5543 ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent..percentage. : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Factor w/ 1873 levels "2005-11-15 00:00:00",..: 426 1866 260 1535 1757 1821 1649 1666 1813 1813 ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
##                    ListingKey     ListingNumber    
##  17A93590655669644DB4C06:     6   Min.   :      4  
##  349D3587495831350F0F648:     4   1st Qu.: 400919  
##  47C1359638497431975670B:     4   Median : 600554  
##  8474358854651984137201C:     4   Mean   : 627886  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634  
##  04C13599434217079754AEE:     3   Max.   :1255725  
##  (Other)                :113912                    
##                     ListingCreationDate  CreditGrade         Term      
##  2013-10-02 17:20:16.550000000:     6          :84984   Min.   :12.00  
##  2013-08-28 20:31:41.107000000:     4   C      : 5649   1st Qu.:36.00  
##  2013-09-08 09:27:44.853000000:     4   D      : 5153   Median :36.00  
##  2013-12-06 05:43:13.830000000:     4   B      : 4389   Mean   :40.83  
##  2013-12-06 11:44:58.283000000:     4   AA     : 3509   3rd Qu.:36.00  
##  2013-08-21 07:25:22.360000000:     3   HR     : 3508   Max.   :60.00  
##  (Other)                      :113912   (Other): 6745                  
##                  LoanStatus                  ClosedDate   
##  Current              :56576                      :58848  
##  Completed            :38074   2014-03-04 00:00:00:  105  
##  Chargedoff           :11992   2014-02-19 00:00:00:  100  
##  Defaulted            : 5018   2014-02-11 00:00:00:   92  
##  Past Due (1-15 days) :  806   2012-10-30 00:00:00:   81  
##  Past Due (31-60 days):  363   2013-02-26 00:00:00:   78  
##  (Other)              : 1108   (Other)            :54633  
##   BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating..numeric. ProsperRating..Alpha.  ProsperScore  
##  Min.   :1.000                  :29084         Min.   : 1.00  
##  1st Qu.:3.000           C      :18345         1st Qu.: 4.00  
##  Median :4.000           B      :15581         Median : 6.00  
##  Mean   :4.072           A      :14551         Mean   : 5.95  
##  3rd Qu.:5.000           D      :14274         3rd Qu.: 8.00  
##  Max.   :7.000           E      : 9795         Max.   :11.00  
##  NA's   :29084           (Other):12307         NA's   :29084  
##  ListingCategory..numeric. BorrowerState  
##  Min.   : 0.000            CA     :14717  
##  1st Qu.: 1.000            TX     : 6842  
##  Median : 1.000            NY     : 6729  
##  Mean   : 2.774            FL     : 6720  
##  3rd Qu.: 3.000            IL     : 5921  
##  Max.   :20.000                   : 5515  
##                            (Other):67493  
##                     Occupation         EmploymentStatus
##  Other                   :28617   Employed     :67322  
##  Professional            :13628   Full-time    :26355  
##  Computer Programmer     : 4478   Self-employed: 6134  
##  Executive               : 4311   Not available: 5347  
##  Teacher                 : 3759   Other        : 3806  
##  Administrative Assistant: 3688                : 2255  
##  (Other)                 :55456   (Other)      : 2718  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           False:56459         False:101218    
##  1st Qu.: 26.00           True :57478         True : 12719    
##  Median : 67.00                                               
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey                 DateCreditPulled 
##                         :100596   2013-12-23 09:38:12:     6  
##  783C3371218786870A73D20:  1140   2013-11-21 09:09:41:     4  
##  3D4D3366260257624AB272D:   916   2013-12-06 05:43:16:     4  
##  6A3B336601725506917317E:   698   2014-01-14 20:17:49:     4  
##  FEF83377364176536637E50:   611   2014-02-09 12:14:41:     4  
##  C9643379247860156A00EC0:   342   2013-09-27 22:04:54:     3  
##  (Other)                :  9634   (Other)            :113912  
##  CreditScoreRangeLower CreditScoreRangeUpper
##  Min.   :  0.0         Min.   : 19.0        
##  1st Qu.:660.0         1st Qu.:679.0        
##  Median :680.0         Median :699.0        
##  Mean   :685.6         Mean   :704.6        
##  3rd Qu.:720.0         3rd Qu.:739.0        
##  Max.   :880.0         Max.   :899.0        
##  NA's   :591           NA's   :591          
##         FirstRecordedCreditLine CurrentCreditLines OpenCreditLines
##                     :   697     Min.   : 0.00      Min.   : 0.00  
##  1993-12-01 00:00:00:   185     1st Qu.: 7.00      1st Qu.: 6.00  
##  1994-11-01 00:00:00:   178     Median :10.00      Median : 9.00  
##  1995-11-01 00:00:00:   168     Mean   :10.32      Mean   : 9.26  
##  1990-04-01 00:00:00:   161     3rd Qu.:13.00      3rd Qu.:12.00  
##  1995-03-01 00:00:00:   159     Max.   :59.00      Max.   :54.00  
##  (Other)            :112389     NA's   :7604       NA's   :7604   
##  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   :  2.00             Min.   : 0.00        
##  1st Qu.: 17.00             1st Qu.: 4.00        
##  Median : 25.00             Median : 6.00        
##  Mean   : 26.75             Mean   : 6.97        
##  3rd Qu.: 35.00             3rd Qu.: 9.00        
##  Max.   :136.00             Max.   :51.00        
##  NA's   :697                                     
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries   
##  Min.   :    0.0             Min.   :  0.000      Min.   :  0.000  
##  1st Qu.:  114.0             1st Qu.:  0.000      1st Qu.:  2.000  
##  Median :  271.0             Median :  1.000      Median :  4.000  
##  Mean   :  398.3             Mean   :  1.435      Mean   :  5.584  
##  3rd Qu.:  525.0             3rd Qu.:  2.000      3rd Qu.:  7.000  
##  Max.   :14985.0             Max.   :105.000      Max.   :379.000  
##                              NA's   :697          NA's   :1159     
##  CurrentDelinquencies AmountDelinquent   DelinquenciesLast7Years
##  Min.   : 0.0000      Min.   :     0.0   Min.   : 0.000         
##  1st Qu.: 0.0000      1st Qu.:     0.0   1st Qu.: 0.000         
##  Median : 0.0000      Median :     0.0   Median : 0.000         
##  Mean   : 0.5921      Mean   :   984.5   Mean   : 4.155         
##  3rd Qu.: 0.0000      3rd Qu.:     0.0   3rd Qu.: 3.000         
##  Max.   :83.0000      Max.   :463881.0   Max.   :99.000         
##  NA's   :697          NA's   :7622       NA's   :990            
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   : 0.0000          Min.   : 0.000            Min.   :      0       
##  1st Qu.: 0.0000          1st Qu.: 0.000            1st Qu.:   3121       
##  Median : 0.0000          Median : 0.000            Median :   8549       
##  Mean   : 0.3126          Mean   : 0.015            Mean   :  17599       
##  3rd Qu.: 0.0000          3rd Qu.: 0.000            3rd Qu.:  19521       
##  Max.   :38.0000          Max.   :20.000            Max.   :1435667       
##  NA's   :697              NA's   :7604              NA's   :7604          
##  BankcardUtilization AvailableBankcardCredit  TotalTrades    
##  Min.   :0.000       Min.   :     0          Min.   :  0.00  
##  1st Qu.:0.310       1st Qu.:   880          1st Qu.: 15.00  
##  Median :0.600       Median :  4100          Median : 22.00  
##  Mean   :0.561       Mean   : 11210          Mean   : 23.23  
##  3rd Qu.:0.840       3rd Qu.: 13180          3rd Qu.: 30.00  
##  Max.   :5.950       Max.   :646285          Max.   :126.00  
##  NA's   :7604        NA's   :7544            NA's   :7544    
##  TradesNeverDelinquent..percentage. TradesOpenedLast6Months
##  Min.   :0.000                      Min.   : 0.000         
##  1st Qu.:0.820                      1st Qu.: 0.000         
##  Median :0.940                      Median : 0.000         
##  Mean   :0.886                      Mean   : 0.802         
##  3rd Qu.:1.000                      3rd Qu.: 1.000         
##  Max.   :1.000                      Max.   :20.000         
##  NA's   :7544                       NA's   :7544           
##  DebtToIncomeRatio         IncomeRange    IncomeVerifiable
##  Min.   : 0.000    $25,000-49,999:32192   False:  8669    
##  1st Qu.: 0.140    $50,000-74,999:31050   True :105268    
##  Median : 0.220    $100,000+     :17337                   
##  Mean   : 0.276    $75,000-99,999:16916                   
##  3rd Qu.: 0.320    Not displayed : 7741                   
##  Max.   :10.010    $1-24,999     : 7274                   
##  NA's   :8554      (Other)       : 1427                   
##  StatedMonthlyIncome                    LoanKey       TotalProsperLoans
##  Min.   :      0     CB1B37030986463208432A1:     6   Min.   :0.00     
##  1st Qu.:   3200     2DEE3698211017519D7333F:     4   1st Qu.:1.00     
##  Median :   4667     9F4B37043517554537C364C:     4   Median :1.00     
##  Mean   :   5608     D895370150591392337ED6D:     4   Mean   :1.42     
##  3rd Qu.:   6825     E6FB37073953690388BC56D:     4   3rd Qu.:2.00     
##  Max.   :1750003     0D8F37036734373301ED419:     3   Max.   :8.00     
##                      (Other)                :113912   NA's   :91852    
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:  9.00             1st Qu.:  9.00       
##  Median : 16.00             Median : 15.00       
##  Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :141.00             Max.   :141.00       
##  NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount          LoanOriginationDate LoanOriginationQuarter
##  Min.   : 1000      2014-01-22 00:00:00:   491   Q4 2013:14450         
##  1st Qu.: 4000      2013-11-13 00:00:00:   490   Q1 2014:12172         
##  Median : 6500      2014-02-19 00:00:00:   439   Q3 2013: 9180         
##  Mean   : 8337      2013-10-16 00:00:00:   434   Q2 2013: 7099         
##  3rd Qu.:12000      2014-01-28 00:00:00:   339   Q3 2012: 5632         
##  Max.   :35000      2013-09-24 00:00:00:   316   Q2 2012: 5061         
##                     (Other)            :111428   (Other):60343         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 
##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"

Univariate Plots Section

Prosper lending dropped significantly in 2009, when the company relaunched as a lending platform focused on prime and near-prime borrowers. It then started to scale again and grew rapidly in 2013, suggesting that online lending is a viable channel for both investors and borrowers.
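One way to get yearly loan counts like the ones behind this observation is to extract the year from `LoanOriginationDate` and tabulate it. The sketch below uses five timestamps that appear in the summary output rather than the full column, so the counts are illustrative only. The names `origination` and `loans_per_year` are hypothetical.

```r
# Five origination timestamps copied from the summary output above.
origination <- c(
  "2005-11-15 00:00:00", "2013-10-16 00:00:00", "2013-11-13 00:00:00",
  "2014-01-22 00:00:00", "2014-02-19 00:00:00"
)

# Keep the date part (first 10 characters), parse it, then pull out the year.
origination_date <- as.Date(substr(origination, 1, 10), format = "%Y-%m-%d")
origination_year <- format(origination_date, "%Y")

# Count loans per calendar year.
loans_per_year <- table(origination_year)
print(loans_per_year)
```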

The P2P lending platform is geographically restricted; not all states are open to Prosper loans. States such as California, Florida, Georgia, Illinois, New York, Ohio and Texas account for the largest share of loans.

Borrowers take out Prosper loans for reasons including debt consolidation, home improvement, business, personal loans, auto and other. The most popular purpose by far is debt consolidation, such as paying off credit card debt, which accounts for almost 60,000 loans. This is not surprising, since a Prosper loan can save borrowers money compared with other loans and, most importantly, a borrower’s credit score can improve shortly after consolidating, especially when a high credit utilization ratio has been hurting it.

Let’s now take a look at the distribution of each metric on loans made over the past several years.

Prosper Rating is a letter grade (AA to HR) assigned to a borrower. It is a proprietary system that, like a credit score, is predictive of the likelihood of loan default. Prosper uses this rating to set the pricing on a loan.
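R sorts factor levels alphabetically by default (A, AA, B, ...), which scrambles the risk order in plots and tables. Below is a minimal sketch of storing the rating as an ordered factor instead. It assumes the order AA, A, B, C, D, E, HR from best to worst, and the `ratings_raw` values are made up for illustration.

```r
# Assumed rating order, best to worst.
rating_levels <- c("AA", "A", "B", "C", "D", "E", "HR")

# Made-up sample; empty strings stand for loans without a rating.
ratings_raw <- c("A", "", "D", "B", "E", "C", "AA", "HR", "")

# Values not listed in `levels` (here the empty string) become NA.
rating <- factor(ratings_raw, levels = rating_levels, ordered = TRUE)
print(table(rating, useNA = "ifany"))

# Ordered factors support comparisons: earlier levels count as "smaller".
print(rating[1] < rating[3])  # is "A" ahead of "D" in the order?
```

With the levels fixed this way, bar charts and tables list the ratings in risk order rather than alphabetically.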

Prosper Score is a custom risk score built using historical Prosper data to assess the risk of Prosper borrower listings. It ranges from 1 to 11, with 11 being the best (lowest risk) score and 1 the worst (highest risk).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

Credit Score, Prosper Rating and Prosper Score are the three measures that indicate the credit health behind a loan. As the plots show, the bulk of borrowers fall into Prosper Ratings A, B, C and D, with a Prosper Score between 4 and 8. The most common values are rating C and score 5. Furthermore, most borrowers have a lower-bound credit score between 620 and 700. Even though Prosper requires a minimum credit score of 640 to qualify for a personal loan, some loans in the data set still went to borrowers with a score below 640.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975

The mean interest rate across all Prosper loans is fairly substantial at 19.28%. Interestingly, loan volume spikes at a rate of around 32%, which suggests that many investors are interested in higher-risk investments that offer a higher interest rate in return.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1242  0.1730  0.1827  0.2400  0.4925

Lender yield is equal to the interest rate on the loan less the servicing fee. It is also one of the most important inputs into any return calculation.
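As a small worked example, the implied servicing fee can be recovered by subtracting `LenderYield` from `BorrowerRate`. The values below are the first five rows from the structure listing. The name `implied_fee` is hypothetical, and treating the whole gap as the servicing fee is an assumption based on the definition above.

```r
# First five BorrowerRate and LenderYield values from the structure listing.
borrower_rate <- c(0.158, 0.092, 0.275, 0.0974, 0.2085)
lender_yield  <- c(0.138, 0.082, 0.24, 0.0874, 0.1985)

# Rounding removes floating-point noise from the subtraction.
implied_fee <- round(borrower_rate - lender_yield, 4)
print(implied_fee)
```

For these five loans the gap is roughly 1 to 3.5 percentage points. The listing prints rounded values, so the figures are approximate.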

Borrower’s Financial Information

More than half of the loans (roughly 55%, per the IncomeRange counts in the summary above) went to borrowers with an income between 25,000 and 75,000 USD. Loans offered through Prosper only range from 2,000 to 35,000 USD, so this makes sense: people earning a higher wage of around 80,000 to 100,000 USD are less likely to borrow money than those with a lower income. On the other hand, these borrowers might also be young adults starting their careers at a junior level.

Again, we see that people with a monthly income between 2,000 and 8,000 USD are the ones who most often take out a personal loan.

The vast majority of borrowers are employed or have a full-time job. This suggests that they are able to pay off the loan and meet the requirements to be issued one. However, retired and not-employed people borrow more than those with a part-time job. One possible explanation is that part-time workers are more likely to be students, while retired and not-employed people are often professionals with more years of earning experience and more responsibilities in life, and are therefore more likely to need a loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   2.167   5.583   8.006  11.417  62.917    7625

The data is positively skewed and long-tailed. Employment duration has a median of about 5.6 years and a mean of 8 years, which indicates that young professionals are the main customers of Prosper personal loans.
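A mean that sits well above the median is a quick sign of positive skew. The sketch below illustrates this in Python on made-up durations; the numbers are invented for illustration and are not taken from the Prosper data.

```python
from statistics import mean, median

# Made-up employment durations in years, for illustration only.
durations = [0.5, 1, 2, 3, 5, 6, 8, 12, 20, 40]

avg = mean(durations)
mid = median(durations)

# A few very long durations pull the mean up while the median barely moves.
is_right_skewed = avg > mid
print(f"mean={avg}, median={mid}, right-skewed={is_right_skewed}")
```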

A debt-to-income (DTI) ratio below 36% is preferable. Most Prosper loans had a DTI ratio of roughly 40% to 70%. The lower the ratio, the better the chance that an individual will be able to get a loan.
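For readers unfamiliar with the measure, the debt-to-income ratio is total monthly debt payments divided by monthly income. A minimal Python sketch follows; the function name, the constant name and the borrower figures are hypothetical, and only the 36% threshold comes from the text.

```python
PREFERRED_MAX_DTI = 0.36  # threshold quoted in the text above


def debt_to_income(monthly_debt_payments: float, monthly_income: float) -> float:
    """Return monthly debt payments as a fraction of monthly income."""
    if monthly_income <= 0:
        raise ValueError("monthly income must be positive")
    return monthly_debt_payments / monthly_income


# Hypothetical borrower: 1,100 USD of debt payments on 5,000 USD of income.
dti = debt_to_income(1_100, 5_000)
print(f"DTI = {dti:.0%}, below preferred maximum: {dti < PREFERRED_MAX_DTI}")
```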

Most of the time, people borrowed between 1,000 and 10,000 USD. They tend to request round amounts such as 10,000, 15,000, 20,000 or 25,000 USD and are less likely to request an amount in between.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0   131.6   217.7   272.5   371.6  2251.5

After removing 0.01 percent of the data as outliers, we have a clearer picture of this figure. Based on the summary above, the median and mean monthly payments are about 218 USD and 272 USD respectively.

The higher the credit card utilization, the more people opt for a personal loan. The general rule of thumb is to keep your balance at or below 33% of your total available credit to protect your credit score. Borrowers using 66% or more of their available credit may also carry a higher investment risk.
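The two thresholds can be expressed as a simple banding rule. This is a minimal Python sketch; the function name and the band labels are hypothetical, and only the 33% and 66% cut-offs come from the text.

```python
def utilization_band(balance: float, credit_limit: float) -> str:
    """Classify credit utilization using the 33% and 66% cut-offs."""
    if credit_limit <= 0:
        raise ValueError("credit limit must be positive")
    utilization = balance / credit_limit
    if utilization <= 0.33:
        return "within rule of thumb"
    if utilization >= 0.66:
        return "higher risk"
    return "elevated"


# Hypothetical borrowers with a 1,000 USD credit limit.
for balance in (300, 500, 700):
    print(balance, utilization_band(balance, 1_000))
```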

Borrowers can take out repeat loans with Prosper, and many of them have had 1 previous loan.

Prosper offers three loan terms: 1, 3 and 5 years. Interestingly, more than 50% of borrowers signed up for the 3-year term. Meanwhile, the 1-year term seems much less attractive than the others.

As we can see, the number of investors changed remarkably in 2013 and 2014. Loans funded by a single investor, who takes on the whole loan, increased considerably. Compared with loans funded by multiple investors, single-investor loans make up a considerably larger share of the graph. This trend is expected to grow strongly in the near future.

Univariate Analysis

What is the structure of your dataset?

The original dataset contained 113,937 loan records with 81 variables to examine. The key variables in my analysis fall into three groups: borrower’s credit information (Credit Grade, Prosper Rating, Prosper Score), borrower’s financial health (Debt to Income Ratio, Bankcard Utilization, Current credit line) and estimated investment return (Lender Yield, Borrower Rate, number of investors).

Other observations:
* Nearly 25% of investors funded a whole loan over the given period, and this share grew significantly to 50% of investors in 2013 and 2014.
* Most loans are taken out for debt consolidation.
* The median borrower rate is about 18% and the maximum rate is 49.75%.
* 50% of Prosper loan borrowers have less than 6 years of working experience.
* The median Debt to Income ratio is 22%.

What is/are the main feature(s) of interest in your dataset?

First, I’m interested in Prosper Rating and Prosper Score, which reflect a borrower’s credit history and serve as tools for predicting loan pricing and the likelihood of default. From another point of view, Borrower Rate and Lender Yield are of most concern to lenders, since these figures anticipate returns to some degree. Digging deeper into the dataset, I would like to take the perspective of an investor examining the relationship between borrower profile, estimated return/loss and the chance of default on each loan.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

So far, I think Debt to Income ratio, Employment duration and Monthly Payment might have meaningful relationships with Prosper Rating and Score. Loan term is another factor to take into account; I think it relates closely to loan purpose and credit health, which I will explore later on. The number of loans funded by single investors is an intriguing figure, since it increased dramatically in the last two years. Analyzing credit grade, lender yield, borrower rate and number of delinquencies could help explain this trend from an investor’s point of view.

Did you create any new variables from existing variables in the dataset?

I created a new variable for the loan origination month and year, and recoded loan status as a factor variable.
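The two derived variables could be built along the following lines. This is a hedged Python sketch rather than the original R code: the function names are hypothetical, the timestamp format is assumed to match the `ListingCreationDate` values shown in the data structure, and the mapping of raw statuses to the four groups seen later in the output (Cancelled, Current or Paid, Defaulted, Past Due) is my assumption, including the raw status names other than "Cancelled" and "Chargedoff".

```python
from datetime import datetime


def origination_year_month(timestamp: str):
    """Extract (year, month) from a 'YYYY-MM-DD HH:MM:SS.fffffffff' string."""
    # Keep only the first 19 characters: the 9-digit fraction is longer
    # than the 6 digits that strptime's %f directive accepts.
    parsed = datetime.strptime(timestamp[:19], "%Y-%m-%d %H:%M:%S")
    return parsed.year, parsed.month


def group_status(loan_status: str) -> str:
    """Collapse raw loan statuses into four groups (assumed mapping)."""
    if loan_status.startswith("Past Due"):
        return "Past Due"
    if loan_status in ("Chargedoff", "Defaulted"):
        return "Defaulted"
    if loan_status == "Cancelled":
        return "Cancelled"
    # Assumption: everything else is still being repaid or was repaid.
    return "Current or Paid"


print(origination_year_month("2005-11-09 20:44:28.847000000"))
print(group_status("Past Due (1-15 days)"))
```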

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

Yes, there are. The volume at a borrower rate of 32% is exceptionally high compared with its nearest rates such as 30% and 34%. The same pattern appears in the lender yield distribution. Additionally, borrowers choose the 3-year term most of the time; this is worth investigating further to find out why. I transformed some variables into ordered categories, since this helps the code run more quickly and gives a better view of their distributions. I also converted the DateTime variables to extract the exact month and year of each transaction.

Bivariate Plots Section

## 
##  Pearson's product-moment correlation
## 
## data:  ProsperRating..numeric. and LenderYield
## t = -917.52, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9537315 -0.9524993
## sample estimates:
##        cor 
## -0.9531194
## 
##  Pearson's product-moment correlation
## 
## data:  ProsperRating..numeric. and EstimatedLoss
## t = -1058.9, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9646522 -0.9637054
## sample estimates:
##        cor 
## -0.9641819

In these analyses, both interest rates and losses are grouped by Prosper Rating. As we can see, the lower the Prosper Rating, the higher the yield that can be earned, and also the more money lenders could lose. Therefore, simply by looking at Prosper Rating we can get a first estimate of the return and the risk lenders take. I would say that loans with higher returns also have a higher likelihood of default.
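The t statistics in the output above follow directly from the correlation coefficient and the degrees of freedom, via t = r * sqrt(df / (1 - r^2)). The Python sketch below recomputes them from the printed values as a sanity check; the function name `pearson_t_statistic` is hypothetical.

```python
import math


def pearson_t_statistic(r: float, df: int) -> float:
    """t statistic for testing that a Pearson correlation is zero."""
    return r * math.sqrt(df / (1.0 - r * r))


# Values copied from the two correlation tests above.
t_yield = pearson_t_statistic(-0.9531194, 84851)
t_loss = pearson_t_statistic(-0.9641819, 84851)

print(f"Prosper Rating vs Lender Yield:   t = {t_yield:.2f}")
print(f"Prosper Rating vs Estimated Loss: t = {t_loss:.2f}")
```

With almost 85,000 degrees of freedom, even a weak correlation would produce a tiny p-value, so the size of r matters more here than the significance test.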

## prosper_loan$status: Cancelled
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0975  0.1345  0.1950  0.1784  0.2325  0.2325 
## -------------------------------------------------------- 
## prosper_loan$status: Current or Paid
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1174  0.1660  0.1749  0.2299  0.4925 
## -------------------------------------------------------- 
## prosper_loan$status: Defaulted
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1605  0.2250  0.2210  0.2820  0.4800 
## -------------------------------------------------------- 
## prosper_loan$status: Past Due
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0499  0.1785  0.2299  0.2244  0.2816  0.3335

As I expected, defaulted loans have a higher median Lender Yield than Cancelled and Current or Paid loans. The median lender yield of the defaulted notes was 22.5%, just slightly lower than that of Past Due loans (22.99%).

## 
##  Pearson's product-moment correlation
## 
## data:  OpenRevolvingMonthlyPayment and EmploymentStatusDuration
## t = 59.47, df = 106310, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1736111 0.1852463
## sample estimates:
##      cor 
## 0.179435

It is worth noting that those who have been in the workforce longer tend to carry more debt and make higher monthly payments on it than young professionals, although the correlation is weak (about 0.18).

## prosper_loan$Term: 12
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0300  0.0829  0.1334  0.1401  0.1964  0.2569 
## -------------------------------------------------------- 
## prosper_loan$Term: 36
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.0100  0.1174  0.1700  0.1834  0.2499  0.4925 
## -------------------------------------------------------- 
## prosper_loan$Term: 60
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0569  0.1390  0.1770  0.1830  0.2219  0.3204

Looking at this, we can partly explain why many more investors opt for 3-year loans. The median yield of the 3-year term is about 3.7 percentage points higher than that of the 1-year term, while the 5-year term has nearly the same mean yield as the 3-year term (18.30% versus 18.34%). The 3-year term therefore looks like the most attractive investment given its returns and the length of time the money is committed.

## prosper_loan$LoanAmount.bucket: (0,5e+03]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1500  0.2284  0.2193  0.2900  0.4975 
## -------------------------------------------------------- 
## prosper_loan$LoanAmount.bucket: (5e+03,1.5e+04]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0100  0.1299  0.1710  0.1767  0.2195  0.3600 
## -------------------------------------------------------- 
## prosper_loan$LoanAmount.bucket: (1.5e+04,2.5e+04]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1101  0.1400  0.1430  0.1725  0.3575 
## -------------------------------------------------------- 
## prosper_loan$LoanAmount.bucket: (2.5e+04,3.5e+04]
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0759  0.1153  0.1302  0.1293  0.1435  0.1819
## 
##  Pearson's product-moment correlation
## 
## data:  LoanOriginalAmount and BorrowerRate
## t = -117.58, df = 113935, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3341283 -0.3237719
## sample estimates:
##        cor 
## -0.3289599

From this picture, it appears that smaller loans carry much higher interest rates than larger ones, even though the correlation is only moderate (about -0.33). I don’t yet know why, but this pattern is intriguing enough to investigate further.
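The grouped summaries above rely on right-closed loan amount buckets such as (0,5e+03]. The following Python sketch shows one way to assign an amount to such a bucket; the break points are read off the bucket labels in the output, while the function name and the use of `bisect` are my own choices, not the original R code.

```python
from bisect import bisect_left

# Break points read off the bucket labels in the output above.
BREAKS = [0, 5_000, 15_000, 25_000, 35_000]


def loan_amount_bucket(amount: float):
    """Return the right-closed bucket '(low,high]' containing amount."""
    idx = bisect_left(BREAKS, amount)
    # Amounts at or below the first break, or above the last, fit no bucket.
    if idx == 0 or idx == len(BREAKS):
        return None
    return f"({BREAKS[idx - 1]},{BREAKS[idx]}]"


for amount in (1_000, 5_000, 5_001, 35_000):
    print(amount, loan_amount_bucket(amount))
```

Because the intervals are closed on the right, a loan of exactly 5,000 USD falls in the first bucket rather than the second.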

Almost all HR-rated loans are for less than 5,000 USD. The largest loans allowed (25,000-35,000 USD) appear only in grades A and B; loans in this range are reliable enough and yield more return than grade AA. No wonder smaller loans correlate with higher borrower rates.

As we expected, 3-year term loans have a higher number of default cases than 5-year term loans. Additionally, the 3-year and 5-year terms are more attractive than the 1-year term due to their higher lender yield.

Speaking of default risk, we can see that the number of defaults is typically high for loans graded D, E and HR, since they are considered riskier. Surprisingly, the highest number of defaulted loans comes not from the highest-risk grade HR but from grade D.

If you own a home, it may very well be your biggest asset, and the mortgage on that home might be your largest debt. If you rent a home, the monthly rent is also a major expense. As we can see, in the 5-year loan term there is only a small gap between the number of borrowers who own a house and those who don’t. Homeowners may have a mortgage to pay each month, so they may be more likely to opt for longer-duration p2p loans, since a mortgage payment is a large debt.

Surprisingly, borrowers who do not own a house have a higher chance of default. Homeownership may be considered an indicator of financial responsibility and low credit risk. After doing some research, I found that those who have been approved for a mortgage by a bank, or who actually own a house, must have demonstrated financial stability.

The majority of loans have fewer than 15 open accounts at a time, although the distribution has a long tail. Default rates are much higher for borrowers with 5 or fewer accounts. Based on this analysis, we should be very careful about lending to a borrower with fewer than 5 open accounts.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

So far, I haven’t found any significant correlations among the factors above, except between Prosper Rating and Lender Yield/Estimated Loss. However, I found some interesting patterns between Lender Yield/Borrower Rate and Term, and between Loan Status and Loan Original Amount. Employment Status Duration is also worth observing in association with Debt to Income ratio and Monthly Payment.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

I found some interesting patterns, but no particular relationship stood out in my exploration of the other features.

What was the strongest relationship you found?

Prosper Rating is strongly negatively correlated with Lender Yield (about -0.95) and Estimated Loss (about -0.96).

Multivariate Plots Section

## prosper_loan$ProsperRating..Alpha.: AA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   62.75  150.00  178.37  274.00 1035.00 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00   51.00   97.53  162.00 1189.00 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     1.0    19.0    69.8   112.0   856.0 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    9.00   51.09   78.00 1024.00 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    4.00   35.00   56.23   87.00  511.00 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00   25.00   35.01   55.00  279.00 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: HR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    9.00   31.00   35.28   54.00  237.00 
## -------------------------------------------------------- 
## prosper_loan$ProsperRating..Alpha.: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    34.0    78.0   116.1   158.0   913.0

It appears that larger investors who look to fund a whole loan are more attracted to the A-B-C range of Prosper Ratings. On the other hand, AA profiles appear to be a sweet spot for multiple investors seeking lower-risk, lower-yielding loans. While a large proportion of loans might be funded wholly today, this was not always the case. Before 2013, more than 90 percent of loans were funded by multiple investors. This dramatic shift is likely due to larger investors’ increasing interest in online lending and their desire to accumulate substantial portfolios.

This plot shows the relationship between Lender Yield and Prosper Rating across loan amounts. Lender Yield separates clearly by Prosper Rating. The higher the risk, the smaller the loan amount that can be issued. Borrowers graded A and B take out most of the maximum-size loans. Defaulted loans also concentrate at higher interest rates. The riskiest HR-rated borrowers are only approved for loans under 5,000 USD.

While D, E and HR are the truly high-risk loan types, many investors seem to be interested in those categories. The region of the plot with loan amounts below 10,000 USD and fewer than 100 investors is quite crowded, which could be explained by the fact that riskier loans yield higher returns. Furthermore, loans funded wholly by a single investor are spread across all amounts and seem to attract a large number of lenders in the market. Overall, investors seem uninterested in sharing high-risk loans with too many other people.

The distribution of Lender Yield by Prosper Rating clearly changed from year to year. Lender Yield becomes more neatly separated by grade year over year, suggesting improvement in Prosper’s rating and risk-control system.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

The number of investors on each loan is also associated with Prosper Rating and Lender Yield. Less concentrated portfolios tend to be those with a high rate, and they are spread across all values of DTI.

Were there any interesting or surprising interactions between features?

Yes, there were. I found that the higher the risk of a loan, the smaller the amount that can be issued. As we can see, HR-rated borrowers are only accepted for loans of less than 5,000 USD. On the other hand, larger loan requests are associated with better-rated loans.

Final Plots and Summary

Plot One

Description One

The graph above briefly summarises Prosper loan activity from 2005 to 2014. From 2006 to 2009, Prosper determined loan rates using an auction system. Following its SEC registration in 2009, the company introduced a new model in which rates are determined by a formula evaluating each prospective borrower’s credit risk, the “Prosper Rating”. That’s why the Prosper Rating only appears in the graph from 2009. Prosper’s loan volume also dropped significantly in 2009 due to the SEC event. 2013 saw remarkable growth in Prosper’s volume, due to its own improvements and substantially increasing awareness among borrowers and investors.

One thing worth noting here: the data for 2014 only covers the first three months (January, February and March), which is why the total shown in the plot is unusually low. Yet the volume for just those three months nearly reaches the total for 2012 and almost half of the total for 2013, which indicates that Prosper keeps growing substantially year after year.

Plot Two

Description Two

This plot shows the strong relationship between Prosper Rating and Lender Yield. Investors can use it to get a basic understanding of how a portfolio works.

Plot Three

Description Three

Debt to Income Ratio also indicates the health of a portfolio. This plot gives an overview of the Prosper loan market: which factors investors care about most and how others put money into their investments.

Reflection

This project took much more time than I expected at the outset. When I first approached the dataset, I had no idea what information it was conveying; every variable and every figure seemed strange to me, since I have no experience working in the financial industry. Realizing that I needed to gain some first-hand knowledge about finance and p2p lending platforms, I did a lot of research and read about the Prosper and Lending Club businesses, trying to understand the patterns behind the data, draw insight from it and learn how to make a real investment as an investor.

The more research I did, and the more I learned about how to calculate and predict the return on each loan from distinct variables, the more sense the Prosper data made to me. Each variable plays an important role in the whole picture. To really learn how to invest in a loan effectively, I believe investors not only have to dive deep into the data provided but also need to gain a lot more practical experience along the way.

To investigate this dataset further, I would like to learn more about predictive modelling and try to predict the probability of default for each loan from the available observations.